Temporal and Information Flow Based Event Detection from Social Text Streams
Abstract
Recently, social text streams (e.g., blogs, web forums, and emails) have become ubiquitous with the evolution of the web. In some sense, social text streams are sensors of the real world. Often, it is desirable to extract real world events from the social text streams. However, existing event detection research has focused mainly on the stream properties of social text streams and has ignored the contextual, temporal, and social information embedded in the streams. In this paper, we propose to detect events from social text streams by exploring the content as well as the temporal and social dimensions. We define the term event as the information flow between a group of social actors on a specific topic over a certain time period. We represent social text streams as multi-graphs, where each node represents a social actor and each edge represents the information flow between two actors. The content and temporal associations within the flow of information are embedded in the corresponding edge. Events are detected by combining text-based clustering, temporal segmentation, and information flow-based graph cuts of the dual graph of the social networks. Experiments conducted with the Enron email dataset and the political blog dataset from Dailykos show that the proposed event detection approach outperforms the alternatives.

Introduction

Recently, social text stream data, such as weblogs, message boards, mailing lists, review sites, and web forums, have become ubiquitous with the evolution of the web. We refer to a collection of text communication data that arrives over time as social text stream data, where each piece of text in the stream is associated with some social attributes such as author, reviewer, sender, and recipients. In the last few years, social text stream data has changed the way we communicate daily, the way we market businesses, and even the way political campaigns are conducted.
For example, weblogs are now used not only by individual users for sharing information about their daily lives and thoughts, but also by large business corporations and political parties to release their latest products or new proposals. Usually, social text stream data arrives over time and each piece of the stream carries part of the semantics (e.g., information about real world events) (Kleinberg 2006). In this sense, social text streams are sensors of the real world.

Social text streams generate large amounts of text data from various types of sources. The streams have rich content such as text, social networks, and temporal information. Efficiently organizing and summarizing the embedded semantics has become an important issue. Most text semantic analysis techniques have focused on TDT (Topic Detection and Tracking) (Krause, Leskovec, & Guestrin 2006; Yang, Pierce, & Carbonell 1998) for general text data (e.g., scientific papers, newswires, and corporate documents). However, social text stream data is substantially different from general text stream data: (1) social text stream data contains rich social connections (between the information senders/authors and recipients/reviewers) and temporal attributes of each text piece; and (2) the content of each text piece in the social text stream data is more context sensitive. That is, not only does the meaning of words within a text piece depend on the context, but the meaning of a text piece also depends on the social actors (e.g., sender, author, recipient, and commenter) and on other temporally correlated text pieces (e.g., the temporal and content information of previous text pieces).

Copyright © 2007, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
Dataset links: http://www.cs.cmu.edu/~enron/ (Enron email dataset); http://www.dailykos.com (Dailykos)
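The multi-graph view of a social text stream described in the abstract (actors as nodes, one edge per piece of information flow, with content and time carried on the edge) can be illustrated with a minimal sketch. This is not the authors' implementation: the `TextPiece` class, the `build_multigraph` and `edges_in_window` helpers, and the sample actors and messages are all hypothetical names and data chosen for illustration, and the fixed time window stands in for the paper's temporal segmentation only as a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TextPiece:
    """One item of a social text stream (hypothetical structure)."""
    sender: str
    recipients: tuple
    timestamp: datetime
    text: str


def build_multigraph(stream):
    """Map each ordered (sender, recipient) pair to its list of parallel edges.

    Every edge keeps the timestamp and text of the piece that produced it,
    so content and temporal information stay attached to the information flow.
    """
    graph = defaultdict(list)
    for piece in stream:
        for recipient in piece.recipients:
            graph[(piece.sender, recipient)].append((piece.timestamp, piece.text))
    return dict(graph)


def edges_in_window(graph, start, end):
    """Return (sender, recipient, timestamp, text) edges with start <= timestamp < end."""
    selected = [
        (sender, recipient, timestamp, text)
        for (sender, recipient), edges in graph.items()
        for timestamp, text in edges
        if start <= timestamp < end
    ]
    # Sort by time, then by actor names, so the result order is deterministic.
    return sorted(selected, key=lambda e: (e[2], e[0], e[1]))


# Illustrative data only; the actors and messages are made up.
stream = [
    TextPiece("alice", ("bob", "carol"), datetime(2001, 5, 1, 9, 0), "budget meeting"),
    TextPiece("bob", ("alice",), datetime(2001, 5, 1, 10, 0), "re: budget meeting"),
    TextPiece("alice", ("bob",), datetime(2001, 5, 3, 8, 0), "quarterly report"),
]
graph = build_multigraph(stream)
window = edges_in_window(graph, datetime(2001, 5, 1), datetime(2001, 5, 2))
```

Storing parallel edges per ordered actor pair keeps each message as its own edge, which is what distinguishes a multi-graph from a simple weighted graph; a graph library with multi-edge support could replace the plain dictionary used here.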
In this work, we propose to detect events from social text streams by exploring features in three dimensions: textual content, social, and temporal. In the literature, there are some existing works on semantic analysis of text stream data and social network data (McCallum & Huang 2004; Mei et al. 2006; Yang, Pierce, & Carbonell 1998). For example, different text content analysis techniques have been proposed to classify a huge number of emails into different topics or to identify entities (McCallum & Huang 2004). Event detection algorithms have been proposed for newswires (Yang, Pierce, & Carbonell 1998) and blogs (Mei et al. 2006). Different communities can be extracted from the email communication network as well (Leuski 2004). However, most of them ignore either the social network information or the temporal properties of the stream data. Yet the embedded temporal and information flow patterns (communication patterns) and the social network relations in social text stream data can be used to improve the
Similar resources
Mining spatio-temporal information on microblogging streams using a density-based online clustering method
doi:10.1016/j.eswa.2012.02.136. Social networks have been regarded as a timely and cost-effective source of spatio-temporal information for many fields of application. However, while some research groups have successfully developed topic detection methods from the text streams for a while, and even som...
Event Detection and Visualization for Social Text Streams
In this paper, we propose to detect events from social text streams by exploring the content as well as the temporal and social dimensions. We define the term event in the social text streams (e.g., blogs, emails, and Usenets) as a set of relations between social actors on a specific topic over a certain time period. We represent social text streams as multi-graphs, where each node represents a...
What Is New in Our City? A Framework for Event Extraction Using Social Media Posts
Post streams from public social media platforms such as Instagram and Twitter have become precious but noisy data sources to discover what is happening around us. In this paper, we focus on the problem of detecting and presenting local events in real time using social media content. We propose a novel framework for real-time city event detection and extraction. The proposed framework first appl...
Event Detection in Social Streams
Social networks generate a large amount of text content over time because of continuous interaction between participants. The mining of such social streams is more challenging than traditional text streams, because of the presence of both text content and implicit network structure within the stream. The problem of event detection is also closely related to clustering, because the events can on...
Event based classification of Web 2.0 text streams
Web 2.0 applications like Twitter or Facebook create a continuous stream of information. This demands new ways of analysis in order to offer insight into this stream right at the moment of the creation of the information, because lots of this data is only relevant within a short period of time. To address this problem real time search engines have recently received increased attention. They tak...
Publication date: 2007